Add docs for IOI by SeanNaren · Pull Request #948 · NVIDIA-NeMo/Skills

SeanNaren · 2025-10-15T11:05:03Z

Summary by CodeRabbit

Documentation
- Added a comprehensive IOI evaluation workflow covering IOI24/IOI25 contexts.
- Step-by-step data preparation, running evaluation, and results verification guidance.
- Example evaluation command set to 50 solutions per sub-task and notes for cluster/server use.
- Replaced older IOI subsection with a broader IOI-focused workflow.
- Removed an explicit benchmark-declaration line from the human-eval-infilling section.

coderabbitai · 2025-10-15T11:05:20Z

Walkthrough

Replaced the ioi24 subsection with a single IOI section documenting IOI24/IOI25: dataset preparation via ns prepare_data, ns eval usage with Slurm/local options and multi-solution settings (e.g., 50 solutions per subtask), and result verification. Removed a benchmark-definition line from human-eval-infilling.
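
For readers who script this workflow, the evaluation command discussed in this PR can be assembled programmatically. The following is a minimal Python sketch, assuming the flags are exactly those quoted in the review suggestions later in this thread. `IoiEvalConfig` and `build_eval_command` are hypothetical helper names, not part of NeMo-Skills, and the sketch only prints the command instead of running it.

```python
import shlex
from dataclasses import dataclass


@dataclass
class IoiEvalConfig:
    # Hypothetical container; the field names are illustrative, not a NeMo-Skills API.
    cluster: str
    data_dir: str
    output_dir: str
    metadata_test_file: str
    model: str = "nvidia/OpenReasoning-Nemotron-32B"
    benchmark: str = "ioi24"
    num_solutions: int = 50


def build_eval_command(cfg: IoiEvalConfig) -> list[str]:
    """Return the argv for the `ns eval` invocation quoted in this PR."""
    return [
        "ns", "eval",
        f"--cluster={cfg.cluster}",
        f"--model={cfg.model}",
        "--server_type=vllm",
        "--server_args=--async-scheduling",
        "--server_nodes=1",
        "--server_gpus=8",
        # The thread writes the benchmark as name:count, e.g. ioi24:50.
        f"--benchmarks={cfg.benchmark}:{cfg.num_solutions}",
        "--with_sandbox",
        "--split=test",
        f"--data_dir={cfg.data_dir}",
        f"--output_dir={cfg.output_dir}",
        f"--extra_eval_args=++eval_config.test_file={cfg.metadata_test_file}",
        "++inference.temperature=0.6",
        "++inference.top_p=0.95",
        "++inference.tokens_to_generate=65536",
    ]


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    cfg = IoiEvalConfig(
        cluster="my-cluster",
        data_dir="/data",
        output_dir="/out",
        metadata_test_file="/data/test_metadata.json",
    )
    print(shlex.join(build_eval_command(cfg)))
```

Keeping the arguments as a list avoids shell-quoting mistakes, and `shlex.join` produces a copy-pasteable string for logs.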

Changes

Cohort / File(s)	Change summary
Documentation — IOI evaluation workflow `docs/evaluation/code.md`	Replaced the ioi24-specific subsection with a unified IOI section covering IOI24/IOI25: added data preparation steps (`ns prepare_data ioi24
Documentation — human-eval-infilling tweak `docs/evaluation/code.md`	Removed the explicit "Benchmark is defined" declaration line in the human-eval-infilling subsection, leaving only the original benchmark source link.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant NS as ns CLI
  participant DS as Dataset Store
  participant SL as Slurm Scheduler
  participant EV as Evaluator
  participant RS as Results/Logs

  rect rgb(235,245,255)
    note over U,NS: Data preparation (IOI24/IOI25)
    U->>NS: ns prepare_data ioi24
    NS->>DS: fetch & prepare IOI artifacts
    DS-->>NS: prepared dataset path
    NS-->>U: prints prepared-data path
  end

  rect rgb(240,255,240)
    note over U,NS: Evaluation (multi-solution)
    U->>NS: ns eval --benchmarks=ioi24:50 --cluster=<CLUSTER_NAME> ...
    alt Slurm
      NS->>SL: submit evaluation jobs
      SL->>EV: start evaluator tasks
    else Local
      NS->>EV: run evaluator locally
    end
    EV->>DS: load prepared data
    EV->>EV: generate N solutions per subtask
    EV->>RS: write metrics, logs, artifacts
    RS-->>U: results path for verification
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble docs with nimble paws,
IOI24 and IOI25 join the cause,
Prepare the data, then eval the run,
Fifty solutions until scoring's done,
Carrots tallied, results delight — hop on! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The PR title "Add docs for IOI" is concise, clear, and directly related to the main change in the pull request. The changeset primarily focuses on adding and expanding documentation for IOI (both IOI24 and IOI25 support) in the evaluation code documentation file, which matches exactly what the title conveys. The title is specific enough that a developer scanning the commit history would immediately understand that this PR introduces IOI-related documentation, without being overly verbose or generic.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

docs/evaluation/code.md (2)

378-405: Add shell language hints to command snippets.

Please tag these fenced blocks as bash (or shell) so rendered docs get syntax highlighting and downstream linters stop flagging them.

371-372: Use descriptive link text.

Replace bare “here” with something like “IOI24 dataset on HuggingFace” to satisfy MD059 and improve accessibility.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea29b1b and f73384b.

📒 Files selected for processing (1)

docs/evaluation/code.md (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)

371-371: Link text should be descriptive

(MD059, descriptive-link-text)

377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)
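
The two rules flagged above can be approximated locally before pushing. This is a rough Python sketch, not the markdownlint implementation: it assumes that any fence line toggles the fenced state and that "generic" link text is limited to a small hard-coded word list. `lint_markdown` is a hypothetical helper name.

```python
import re

# A fence line: optional indentation, three or more backticks or tildes, then an info string.
FENCE = re.compile(r"^\s*(`{3,}|~{3,})(.*)$")
# Assumed list of non-descriptive link texts; markdownlint's own list may differ.
GENERIC_LINK = re.compile(r"\[(here|link|click here|more)\]\(", re.IGNORECASE)


def lint_markdown(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule) pairs for two simplified checks."""
    findings = []
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = FENCE.match(line)
        if match:
            # Only an opening fence needs a language; a closing fence has none.
            if not in_fence and not match.group(2).strip():
                findings.append((lineno, "MD040"))
            in_fence = not in_fence
            continue
        if not in_fence and GENERIC_LINK.search(line):
            findings.append((lineno, "MD059"))
    return findings


if __name__ == "__main__":
    tick = "`" * 3  # built at runtime to keep this example fence-safe
    sample = "\n".join([
        "Original benchmark source is [here](https://huggingface.co/datasets/open-r1/ioi).",
        "",
        tick,
        "ns prepare_data ioi24",
        tick,
    ])
    print(lint_markdown(sample))
```

On the sample, the sketch reports the bare "here" link on line 1 and the unlabelled fence on line 3, which mirrors the kind of findings listed in this review.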

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pre-commit
GitHub Check: unit-tests

docs/evaluation/code.md

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

docs/evaluation/code.md (2)

371-372: Use descriptive link text for accessibility.

Replace the bare “here” link text with something meaningful like “IOI24 dataset on Hugging Face” so screen readers convey the destination. Based on static analysis hints

377-405: Annotate shell snippets with their language.

Please add a language like bash to the fenced code blocks so tooling and syntax highlighting work correctly. Based on static analysis hints

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f73384b and 8be70d9.

📒 Files selected for processing (1)

docs/evaluation/code.md (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)

371-371: Link text should be descriptive

(MD059, descriptive-link-text)

377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: unit-tests

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

docs/evaluation/code.md (3)

371-372: Use descriptive link text for accessibility.

Replace “here” with meaningful link text so screen readers convey where the URL leads.

-- Original benchmark source is [here](https://huggingface.co/datasets/open-r1/ioi).
+- Original benchmark source is the [Open-R1 IOI dataset on Hugging Face](https://huggingface.co/datasets/open-r1/ioi).

377-379: Add a language identifier to the CLI code fence.

Specify the shell language for proper syntax highlighting and lint compliance.

-```
+```bash
 ns prepare_data ioi24


389-405: Add the shell language to the eval command fence.

Mark the fence as bash to improve readability and satisfy markdown linting.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
     --server_type=vllm \
     --server_args="--async-scheduling" \
     --server_nodes=1 \
     --server_gpus=8 \
     --benchmarks=ioi24:50 \
     --with_sandbox \
     --split=test \
     --data_dir=<DATA_DIR> \
     --output_dir=<OUTPUT_DIR> \
     --extra_eval_args="++eval_config.test_file=<PATH_TO_METADATA_TEST_FILE>" \
     ++inference.temperature=0.6 \
     ++inference.top_p=0.95 \
     ++inference.tokens_to_generate=65536

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8be70d991c74f44e1235ca5a96891df490dba36b and 04a6eede4e33a4f7291fb4d0c3f1ed092326d07d.

📒 Files selected for processing (1)

docs/evaluation/code.md (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

367-367: Link text should be descriptive

(MD059, descriptive-link-text)

371-371: Link text should be descriptive

(MD059, descriptive-link-text)

377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

389-389: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pre-commit
GitHub Check: unit-tests

docs/evaluation/code.md

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

docs/evaluation/code.md (4)

181-184: Add code block language specifiers and use descriptive link text.

The IOI section has markdown linting violations. The code block starting at line 189 is missing a language specifier, and link text at line 183 should be descriptive rather than "here".

Apply this diff:

 ### IOI

-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [in the open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).

189-191: Specify code block language.

The fenced code block should declare a language for syntax highlighting.

Apply this diff:

 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.

-```
+```bash
 ns prepare_data ioi24
-```
+```

201-218: Specify code block language for the evaluation command.

The fenced code block should declare a language for syntax highlighting.

Apply this diff:

 This command runs an evaluation of [OpenReasoning-Nemotron-32B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-32B) on a Slurm cluster.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \

224-229: Specify code block language for the results output.

The fenced code block showing expected output should declare a language for consistency.

Apply this diff:

 After all jobs are complete, you can check the results in `<OUTPUT_DIR>/eval-results/ioi24/metrics.json`. You can also take a look at `<OUTPUT_DIR>/eval-results/ioi24/summarized-results/main_*`. They should look something like this:

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04a6eed and 55306fc.

📒 Files selected for processing (1)

docs/evaluation/code.md (2 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

183-183: Link text should be descriptive

(MD059, descriptive-link-text)

189-189: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

201-201: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

224-224: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

414-414: Link text should be descriptive

(MD059, descriptive-link-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: unit-tests

🔇 Additional comments (1)

docs/evaluation/code.md (1)

411-414: Verify intentional removal of benchmark definition line in human-eval-infilling.

The human-eval-infilling section now differs structurally from other benchmarks: the "Benchmark is defined in..." line was removed, leaving only the original benchmark source link. Confirm this removal was intentional and aligns with documentation standards.

docs/evaluation/code.md

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

docs/evaluation/code.md (4)

180-190: Add language specifier to code block and use descriptive link text.

Per markdown best practices, code blocks should specify the language and links should use descriptive text instead of "here".

-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen in the [open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).

 #### Data Preparation

 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.

-```
+```bash
 ns prepare_data ioi24


200-217: Add language specifier to ns eval command block.

Specify bash as the language for the code block to improve formatting and readability.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \

223-228: Add language specifier to results output block.

Specify the language for the code block to maintain consistency with other sections.

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score        | round_robin_score

410-413: Use descriptive link text instead of "here".

Replace the generic "here" with text that describes the link destination.

-
-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+
+- Original benchmark source is at [openai/human-eval-infilling](https://github.com/openai/human-eval-infilling).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55306fc and 0c447fa.

📒 Files selected for processing (1)

docs/evaluation/code.md (2 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

182-182: Link text should be descriptive

(MD059, descriptive-link-text)

188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

200-200: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

223-223: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

413-413: Link text should be descriptive

(MD059, descriptive-link-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: unit-tests

🔇 Additional comments (1)

docs/evaluation/code.md (1)

180-229: Excellent IOI documentation additions.

The new IOI section is well-structured and comprehensive, covering data preparation, evaluation with multi-solution settings, and results verification. It directly addresses previous feedback about including expected output examples. The typo fix (METADATA_TEST_FILE) from the earlier review has been properly applied. The section follows established patterns from other benchmarks and provides clear, actionable instructions for users.
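
The eval command in this thread relies on angle-bracket placeholders such as `<CLUSTER_NAME>`, `<DATA_DIR>`, `<OUTPUT_DIR>` and `<PATH_TO_METADATA_TEST_FILE>`. A small Python sketch of one way to guard against running it with a placeholder left unfilled is shown here. `fill_placeholders` is a hypothetical helper, and the uppercase-with-underscores naming pattern is an assumption drawn from the placeholders quoted above.

```python
import re

# Matches tokens like <DATA_DIR>; assumes placeholders are upper-case with underscores.
PLACEHOLDER = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Substitute <NAME> tokens and raise if any placeholder has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    template = "--data_dir=<DATA_DIR> --output_dir=<OUTPUT_DIR>"
    # The paths are illustrative values only.
    print(fill_placeholders(template, {"DATA_DIR": "/data", "OUTPUT_DIR": "/out"}))
```

Failing early on a missing value also catches misspelled placeholder names, since a misspelled token never finds a match in the supplied values.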

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Signed-off-by: Sean Naren <sean.narenthiran@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

docs/evaluation/code.md (5)

182-182: Make link text descriptive.

Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text that describes the link target, e.g., "IOI24 dataset on Hugging Face" or similar.

-The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+The original data for IOI24 can be seen in the [open-r1 IOI dataset](https://huggingface.co/datasets/open-r1/ioi).

188-190: Add language identifier to code block.

Specify bash as the language for syntax highlighting.

-```
+```bash
 ns prepare_data ioi24
-```
+```

200-217: Add language identifier to code block.

Specify bash as the language for syntax highlighting.

-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
-```
+```

223-228: Add language identifier to code block.

Specify a language (e.g., text or plain) for the example output block.

-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score        | round_robin_score
 pass@1[avg-of-50] | 39          | 40387      | 7410        | 0.51% ± 1.04% | 303.47             | 261.01
 pass@50           | 39          | 40387      | 7410        | 2.56%         | 303.47             | 261.01
-```
+```

413-413: Make link text descriptive.

Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text describing the benchmark source, e.g., "human-eval-infilling repository" or similar.

-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+- Original benchmark source is the [human-eval-infilling repository](https://github.com/openai/human-eval-infilling).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c447fa and ca6d3d8.

📒 Files selected for processing (1)

docs/evaluation/code.md (2 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.18.1)

docs/evaluation/code.md

182-182: Link text should be descriptive

(MD059, descriptive-link-text)

188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

200-200: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

223-223: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

413-413: Link text should be descriptive

(MD059, descriptive-link-text)

🔇 Additional comments (1)

docs/evaluation/code.md (1)

180-229: Comprehensive IOI documentation with clear workflow.

The section provides a complete walkthrough: data preparation, evaluation execution with a realistic example, and result verification including expected output. This addresses prior feedback effectively and gives users clear guidance on IOI24/IOI25 evaluation.
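
For result verification, the pipe-delimited summary quoted in this review can be read back into a dictionary and sanity-checked. The Python sketch below works only on the text layout shown in the thread. It makes no claim about the schema of `metrics.json`, and reading `correct` as the share of fully solved entries is an assumption. `parse_summary` and `percent` are hypothetical helper names.

```python
SUMMARY = """\
evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score        | round_robin_score
pass@1[avg-of-50] | 39          | 40387      | 7410        | 0.51% ± 1.04% | 303.47             | 261.01
pass@50           | 39          | 40387      | 7410        | 2.56%         | 303.47             | 261.01
"""


def parse_summary(text: str) -> dict[str, dict[str, str]]:
    """Map each evaluation_mode to its row, keyed by column header."""
    rows = [[cell.strip() for cell in line.split("|")] for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}


def percent(cell: str) -> float:
    """Read the leading percentage from cells such as '0.51% ± 1.04%'."""
    return float(cell.split("%")[0])


if __name__ == "__main__":
    table = parse_summary(SUMMARY)
    entries = int(table["pass@50"]["num_entries"])
    # If exactly 1 of the 39 entries counted as correct, the share would be 100/39.
    print(round(100 / entries, 2), percent(table["pass@50"]["correct"]))
```

Under that assumption, the reported pass@50 of 2.56% is consistent with one of the 39 entries being solved by at least one of the 50 samples.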

Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>

Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Sean Naren <sean.narenthiran@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>

coderabbitai bot reviewed Oct 15, 2025

docs/evaluation/code.md (outdated)

coderabbitai bot reviewed Oct 15, 2025

Kipok reviewed Oct 15, 2025

docs/evaluation/code.md (outdated)

docs/evaluation/code.md (outdated)

coderabbitai bot reviewed Oct 18, 2025

Kipok reviewed Oct 20, 2025

docs/evaluation/code.md (outdated)

SeanNaren commented Oct 22, 2025

docs/evaluation/code.md (outdated)

coderabbitai bot reviewed Oct 22, 2025

SeanNaren added 5 commits October 22, 2025 09:24

add docs

6a2d790

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

fix spelling

266bc7b

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

missing arg

477cdab

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

address feedback

47e2b36

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Update docs/evaluation/code.md

ca6d3d8

Signed-off-by: Sean Naren <sean.narenthiran@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

SeanNaren force-pushed the feat/ioi_docs branch from 0c447fa to ca6d3d8 Compare October 22, 2025 16:25

coderabbitai bot reviewed Oct 22, 2025

Kipok approved these changes Oct 23, 2025

Kipok merged commit c8252d5 into main Oct 23, 2025
6 checks passed

Kipok deleted the feat/ioi_docs branch October 23, 2025 19:23

dgtm777 pushed a commit that referenced this pull request Oct 29, 2025

Add docs for IOI (#948)

235f5cc

Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>

coderabbitai bot mentioned this pull request Dec 4, 2025

Port ICPC changes to IOI #1046

Merged

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Add docs for IOI (#948)

5d2ba8e

Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Sean Naren <sean.narenthiran@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Kipok merged 5 commits into main from feat/ioi_docs
